9 research outputs found

    Neural Paraphrase Identification of Questions with Noisy Pretraining

    We present a solution to the problem of paraphrase identification of questions. We focus on a recent dataset of question pairs annotated with binary paraphrase labels and show that a variant of the decomposable attention model (Parikh et al., 2016) results in accurate performance on this task, while being far simpler than many competing neural architectures. Furthermore, when the model is pretrained on a noisy dataset of automatically collected question paraphrases, it obtains the best reported performance on the dataset.

    New insights into cis-regulatory module evolution using in-silico evolutionary simulations

    Gene regulation is the process by which specific sets of genes are expressed in precise spatial/temporal patterns (Davidson 2010). It is a fundamental process with impact on development and cell identity (Fisher 2002; Davidson 2010), cancer (Riggs and Jones 1983; Ballestar and Esteller 2008) and other diseases such as Alzheimer’s disease (van Duijn et al. 1999), and several other biological processes (Davidson 2010). Understanding gene regulation, and its evolution, is an important quest in biology and medicine, and it is one that is often addressed with the help of computational tools. In this thesis we present a suite of computational tools and statistical methods developed to simulate the evolution of gene regulatory sequences in a realistic setting. We also describe new insights into the function, mechanisms and evolution of gene regulation that have been learned with the help of these tools. We first demonstrate the ability of our tools to model the evolution of regulatory sequences from 12 species of fruit flies. In our comparison with other available tools, we achieve better performance while using a smaller number of free parameters. Additionally, we describe three studies that provide new insights concerning the evolution and mechanism of the regulatory machinery. As the first relevant insight, we demonstrate that the phenomenon of homotypic clustering of transcription factor binding sites, which is often associated with mechanistic implications or origins (e.g., cooperative activation), may also be explained as an evolutionary artifact, or, in the language of Lusk and Eisen (2010), an evolutionary mirage. Our second study demonstrates how the accurate modeling of evolutionary data for regulatory sequences can be used to elicit biophysical mechanisms of the regulatory machinery. Specifically, we demonstrate how discrepancies between our evolutionary model and real data pointed to a possible cooperative interaction between molecules of a transcription factor, which was then confirmed using biological assays. Finally, we use our tool to explore questions related to the time necessary to evolve an enhancer under a diverse set of situations. We find that some enhancers are easier to evolve than others and that a number of factors, including biophysical mechanisms and the starting point for evolution, will impact the time necessary to evolve regulatory sequences. The insights that we have been able to gain using our tools are relevant to biologists, but perhaps equally relevant is the fact that all these insights have been learned largely from one computational tool, which demonstrates the flexibility of our tool in particular, as well as the importance of computational biology approaches in general.

    Evolutionary Origins of Transcription Factor Binding Site Clusters

    Molecular clock analyses estimate that crown-group animals began diversifying hundreds of millions of years before the start of the Cambrian period. However, the fossil record has not yielded unequivocal evidence for animals during this interval. Some of the most promising candidates for Precambrian animals occur in the Weng'an biota of South China, including a suite of tubular fossils assigned to Sinocyclocyclicus, Ramitubus, Crassitubus and Quadratitubus, that have been interpreted as soft-bodied eumetazoans comparable to tabulate corals. Here, we present new insights into the anatomy, original composition and phylogenetic affinities of these taxa based on data from synchrotron radiation X-ray tomographic microscopy, ptychographic nanotomography, scanning electron microscopy and electron probe microanalysis. The patterns of deformation observed suggest that the cross walls of Sinocyclocyclicus and Quadratitubus were more rigid than those of Ramitubus and Crassitubus. Ramitubus and Crassitubus specimens preserve enigmatic cellular clusters at terminal positions in the tubes. Specimens of Sinocyclocyclicus and Ramitubus have biological features that might be cellular tissue or subcellular structures filling the spaces between the cross walls. These observations are incompatible with a cnidarian interpretation, in which the spaces between cross walls are abandoned parts of the former living positions of the polyp. The affinity of the Weng'an tubular fossils may lie within the algae.